AITopics | colin raffel

Collaborating Authors

colin raffel


Getting Your Indices in a Row: Full-Text Search for LLM Training Data for Real World

Marinas, Ines Altemir, Kucherenko, Anastasiia, Sternfeld, Alexander, Kucharavy, Andrei

arXiv.org Artificial Intelligence, Oct-13-2025

The performance of Large Language Models (LLMs) is determined by their training data. Despite the proliferation of open-weight LLMs, access to LLM training data has remained limited. Even for fully open LLMs, the scale of the data makes it all but inscrutable to the general scientific community, despite potentially containing critical data scraped from the internet. In this paper, we present the full-text indexing pipeline for the Apertus LLM training data. Leveraging Elasticsearch parallel indices and the Alps infrastructure, a state-of-the-art, highly energy-efficient arm64 supercluster, we were able to index 8.6T tokens out of 15.2T used to train the Apertus LLM family, creating both a critical LLM safety tool and effectively an offline, curated, open web search engine. Our contribution is threefold. First, we demonstrate that Elasticsearch can be successfully ported onto next-generation arm64-based infrastructure. Second, we demonstrate that full-text indexing at the scale of modern LLM training datasets and the entire open web is feasible and accessible. Finally, we demonstrate that such indices can be used to ensure previously inaccessible jailbreak-agnostic LLM safety. We hope that our findings will be useful to other teams attempting large-scale data indexing and facilitate the general transition towards greener computation.

large language model, machine learning, natural language, (17 more...)

arXiv.org Artificial Intelligence

2510.09471

Country:

North America > United States (1.00)
Europe (1.00)
Asia > Middle East > UAE (0.16)

Genre: Research Report > New Finding (0.66)

Industry:

Materials > Chemicals (1.00)
Information Technology (0.93)
Health & Medicine (0.68)
(3 more...)

Technology:

Information Technology > Information Management > Search (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.93)


What Matters for Model Merging at Scale?

Yadav, Prateek, Vu, Tu, Lai, Jonathan, Chronopoulou, Alexandra, Faruqui, Manaal, Bansal, Mohit, Munkhdalai, Tsendsuren

arXiv.org Artificial Intelligence, Oct-4-2024

Model merging aims to combine multiple expert models into a more capable single model, offering benefits such as reduced storage and serving costs, improved generalization, and support for decentralized model development. Despite its promise, previous studies have primarily focused on merging a few small models. This leaves many unanswered questions about the effect of scaling model size and how it interplays with other key factors, such as the base model quality and the number of expert models, to affect the merged model's performance. This work systematically evaluates the utility of model merging at scale, examining the impact of these different factors. We experiment with merging fully fine-tuned models using 4 popular merging methods (Averaging, Task Arithmetic, Dare, and TIES) across model sizes ranging from 1B to 64B parameters and merging up to 8 different expert models. We evaluate the merged models on both held-in tasks, i.e., the experts' training tasks, and zero-shot generalization to unseen held-out tasks. Our experiments provide several new insights about model merging at scale and the interplay between different factors. First, we find that merging is more effective when experts are created from strong base models, i.e., models with good zero-shot performance. Second, larger models facilitate easier merging. Third, merging consistently improves generalization capabilities. Notably, when merging 8 large expert models, the merged models often generalize better than the multitask trained models. Fourth, we can better merge more expert models when working with larger models. Fifth, different merging methods behave very similarly at larger scales. Overall, our findings shed light on some interesting properties of model merging while also highlighting some limitations. We hope that this study will serve as a reference point on large-scale merging for upcoming research.
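As a rough illustration of two of the merging methods named in the abstract, the sketch below implements simple averaging and task arithmetic on toy parameters. It is a minimal sketch, not the paper's code: the flat `Weights` dictionary format, the function names, and the `scale` parameter are assumptions made for illustration, and real models would use tensors rather than Python lists.

```python
from typing import Dict, List

# Hypothetical format: parameter name -> flat list of values.
Weights = Dict[str, List[float]]


def average_merge(experts: List[Weights]) -> Weights:
    """Simple averaging: element-wise mean of the expert parameters."""
    merged: Weights = {}
    for name in experts[0]:
        columns = zip(*(expert[name] for expert in experts))
        merged[name] = [sum(col) / len(experts) for col in columns]
    return merged


def task_arithmetic_merge(
    base: Weights, experts: List[Weights], scale: float = 1.0
) -> Weights:
    """Task arithmetic: add the scaled sum of task vectors (expert - base) to the base."""
    merged: Weights = {}
    for name, base_vals in base.items():
        merged[name] = [
            b + scale * sum(expert[name][i] - b for expert in experts)
            for i, b in enumerate(base_vals)
        ]
    return merged


# Toy example with one two-element parameter.
base = {"w": [1.0, 2.0]}
expert_a = {"w": [2.0, 2.0]}
expert_b = {"w": [1.0, 4.0]}

avg = average_merge([expert_a, expert_b])
ta = task_arithmetic_merge(base, [expert_a, expert_b], scale=0.5)
print(avg, ta)
```

With `scale` set to one over the number of experts, task arithmetic reduces to plain averaging, which is why both calls give the same result in this toy case.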

arxiv preprint arxiv, large language model, machine learning, (16 more...)

arXiv.org Artificial Intelligence

2410.03617

Country:

Asia > Middle East > Jordan (0.04)
North America > United States > Virginia (0.04)
North America > United States > North Carolina (0.04)
(3 more...)

Genre: Research Report > New Finding (1.00)

Industry: Education (0.46)

Technology:

Information Technology > Communications (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.89)


BloombergGPT: A Large Language Model for Finance

Wu, Shijie, Irsoy, Ozan, Lu, Steven, Dabravolski, Vadim, Dredze, Mark, Gehrmann, Sebastian, Kambadur, Prabhanjan, Rosenberg, David, Mann, Gideon

arXiv.org Artificial Intelligence, Dec-21-2023

The use of NLP in the realm of financial technology is broad and complex, with applications ranging from sentiment analysis and named entity recognition to question answering. Large Language Models (LLMs) have been shown to be effective on a variety of tasks; however, no LLM specialized for the financial domain has been reported in the literature. In this work, we present BloombergGPT, a 50 billion parameter language model that is trained on a wide range of financial data. We construct a 363 billion token dataset based on Bloomberg's extensive data sources, perhaps the largest domain-specific dataset yet, augmented with 345 billion tokens from general purpose datasets. We validate BloombergGPT on standard LLM benchmarks, open financial benchmarks, and a suite of internal benchmarks that most accurately reflect our intended usage. Our mixed dataset training leads to a model that outperforms existing models on financial tasks by significant margins without sacrificing performance on general LLM benchmarks. Additionally, we explain our modeling choices, training process, and evaluation methodology. We release Training Chronicles (Appendix C) detailing our experience in training BloombergGPT.

international conference, language model, news headline talk, (14 more...)

arXiv.org Artificial Intelligence

2303.17564

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
Asia > Middle East > UAE > Abu Dhabi Emirate > Abu Dhabi (0.04)
North America > Dominican Republic (0.04)
(22 more...)

Genre:

Overview (1.00)
Research Report > New Finding (0.45)

Industry:

Health & Medicine (1.00)
Government > Regional Government > North America Government > United States Government (1.00)
Education (1.00)
(4 more...)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)


Derivative Free Weight-space Ensembling

Ninalga, Dean

arXiv.org Artificial Intelligence, Jul-26-2023

Recent work suggests that interpolating between the weights of two specialized language models can transfer knowledge between tasks in a way that multi-task learning cannot. However, very few have explored interpolation between more than two models, where each has a distinct knowledge base. In this paper, we introduce Derivative Free Weight-space Ensembling (DFWE), a new few-sample task transfer approach for open-domain dialogue. Our framework creates a set of diverse expert language models trained using a predefined set of source tasks. Next, we finetune each of the expert models on the target task, approaching the target task from several distinct knowledge bases. Finally, we linearly interpolate between the model weights using a gradient-free optimization algorithm to efficiently find a good interpolation weighting. We demonstrate the effectiveness of the method on FETA-Friends, outperforming the standard pretrain-finetune approach.
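The final step described above, searching for interpolation coefficients without gradients, can be sketched roughly as follows. The abstract does not name the optimizer, so plain random search stands in here as an assumed substitute; the `Weights` format, the function names, and the toy `score` function are likewise hypothetical and only illustrate the idea.

```python
import random
from typing import Callable, Dict, List, Tuple

# Hypothetical format: parameter name -> flat list of values.
Weights = Dict[str, List[float]]


def interpolate(experts: List[Weights], coeffs: List[float]) -> Weights:
    """Linear interpolation: coefficient-weighted sum of the expert parameters."""
    merged: Weights = {}
    for name, first in experts[0].items():
        merged[name] = [
            sum(c * expert[name][i] for c, expert in zip(coeffs, experts))
            for i in range(len(first))
        ]
    return merged


def random_search(
    experts: List[Weights],
    score: Callable[[Weights], float],
    trials: int = 200,
    seed: int = 0,
) -> Tuple[List[float], float]:
    """Gradient-free search (random search, an assumed stand-in) over coefficients that sum to 1."""
    rng = random.Random(seed)
    n = len(experts)
    best = [1.0 / n] * n  # start from uniform weighting
    best_score = score(interpolate(experts, best))
    for _ in range(trials):
        raw = [rng.random() + 1e-12 for _ in range(n)]
        total = sum(raw)
        coeffs = [r / total for r in raw]
        candidate_score = score(interpolate(experts, coeffs))
        if candidate_score > best_score:
            best, best_score = coeffs, candidate_score
    return best, best_score


# Toy stand-in for a validation metric: negative squared distance to a target.
target = [0.2, 0.7]


def score(weights: Weights) -> float:
    return -sum((w - t) ** 2 for w, t in zip(weights["w"], target))


experts = [{"w": [0.0, 0.0]}, {"w": [1.0, 0.0]}, {"w": [0.0, 1.0]}]
uniform_score = score(interpolate(experts, [1 / 3, 1 / 3, 1 / 3]))
best, best_score = random_search(experts, score)
print(best, best_score)
```

In practice `score` would evaluate the merged model on a small validation set for the target task, which is the only signal a gradient-free method needs.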

machine learning, natural language, target task, (18 more...)

arXiv.org Artificial Intelligence

2307.03506

Country: Asia > Middle East > UAE (0.04)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Expert Systems (0.54)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.34)


Pop2Piano : Pop Audio-based Piano Cover Generation

Choi, Jongho, Lee, Kyogu

arXiv.org Artificial Intelligence, Apr-1-2023

Piano covers of pop music are enjoyed by many people. However, the task of automatically generating piano covers of pop music is still understudied. This is partly due to the lack of synchronized {Pop, Piano Cover} data pairs, which made it challenging to apply the latest data-intensive deep learning-based methods. To leverage the power of the data-driven approach, we make a large amount of paired and synchronized {Pop, Piano Cover} data using an automated pipeline. In this paper, we present Pop2Piano, a Transformer network that generates piano covers given waveforms of pop music. To the best of our knowledge, this is the first model to generate a piano cover directly from pop audio without using melody and chord extraction modules. We show that Pop2Piano, trained with our dataset, is capable of producing plausible piano covers.

artificial intelligence, deep learning, machine learning, (17 more...)

arXiv.org Artificial Intelligence

2211.00895

Country: Asia > South Korea > Seoul > Seoul (0.05)

Genre: Research Report > New Finding (0.47)

Industry:

Media > Music (1.00)
Leisure & Entertainment (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)


Fine-Tashkeel: Finetuning Byte-Level Models for Accurate Arabic Text Diacritization

Al-Rfooh, Bashar, Abandah, Gheith, Al-Rfou, Rami

arXiv.org Artificial Intelligence, Mar-25-2023

Most previous work on learning diacritization of the Arabic language relied on training models from scratch. In this paper, we investigate how to leverage pre-trained language models to learn diacritization. We finetune token-free pre-trained multilingual models (ByT5) to learn to predict and insert missing diacritics in Arabic text, a complex task that requires understanding the sentence semantics and the morphological structure of the tokens. We show that we can achieve state-of-the-art results on the diacritization task with a minimal amount of training and no feature engineering, reducing WER by 40%. We release our finetuned models for the greater benefit of the researchers in the community.
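To make the task concrete, the sketch below builds one byte-level training pair: the undiacritized text as input and the diacritized text as target, both as UTF-8 bytes, which is the kind of input a byte-level model consumes. This is an assumed illustration rather than the paper's data pipeline; the helper names are hypothetical, and only the eight combining marks U+064B through U+0652 are treated as diacritics.

```python
from typing import Tuple

# Arabic combining marks from fathatan (U+064B) through sukun (U+0652).
TASHKEEL = {chr(cp) for cp in range(0x064B, 0x0653)}


def strip_diacritics(text: str) -> str:
    """Remove the diacritic marks, leaving only the base letters."""
    return "".join(ch for ch in text if ch not in TASHKEEL)


def make_pair(diacritized: str) -> Tuple[bytes, bytes]:
    """Return (input bytes, target bytes) for one diacritization example."""
    source = strip_diacritics(diacritized)
    return source.encode("utf-8"), diacritized.encode("utf-8")


# Three letters (kaf, ta, ba), each followed by a fatha.
example = "\u0643\u064E\u062A\u064E\u0628\u064E"
src, tgt = make_pair(example)
print(len(src), len(tgt))  # 6 12
```

Each Arabic letter and each mark occupies two bytes in UTF-8, so the target sequence here is twice as long as the input.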

artificial intelligence, machine learning, natural language, (16 more...)

arXiv.org Artificial Intelligence

2303.14588

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
Europe > Ireland > Leinster > County Dublin > Dublin (0.04)

Genre: Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.47)


SPT: Semi-Parametric Prompt Tuning for Multitask Prompted Learning

Bari, M Saiful, Zhang, Aston, Zheng, Shuai, Shi, Xingjian, Zhu, Yi, Joty, Shafiq, Li, Mu

arXiv.org Artificial Intelligence, Dec-21-2022

Pre-trained large language models can efficiently interpolate human-written prompts in a natural way. Multitask prompted learning can help generalization through a diverse set of tasks at once, thus enhancing the potential for more effective downstream fine-tuning. To perform efficient multitask inference in the same batch, parameter-efficient fine-tuning methods such as prompt tuning have been proposed. However, the existing prompt tuning methods may lack generalization. We propose SPT, a semi-parametric prompt tuning method for multitask prompted learning. The novel component of SPT is a memory bank from which memory prompts are retrieved based on discrete prompts. Extensive experiments, such as (i) fine-tuning a full language model with SPT on 31 different tasks from 8 different domains and evaluating zero-shot generalization on 9 held-out datasets under 5 NLP task categories and (ii) pretraining SPT on the GLUE datasets and evaluating fine-tuning on the SuperGLUE datasets, demonstrate the effectiveness of SPT.

large language model, machine learning, natural language, (18 more...)

arXiv.org Artificial Intelligence

2212.10929

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
Africa (0.14)
Europe > Ireland > Leinster > County Dublin > Dublin (0.04)
(6 more...)

Genre: Research Report (0.40)

Industry: Leisure & Entertainment > Sports > Soccer (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)


Happy AI New Year! Global Researchers Reflect on 2019, Talk Trends for 2020

#artificialintelligence, Jan-3-2020, 04:35:35 GMT

The year 2019 saw unprecedented growth in AI research, development and deployment. Great technical progress has been achieved in image recognition, image generation, natural language understanding and other fields, while challenges remain with data management, efficiency measurement, computational capacity and other issues. To welcome 2020 with some fresh AI perspectives, Synced spoke with global researchers from Google Brain, Sony AI, Alibaba affiliate Ant Financial (formerly known as Alipay), Israel-based AI processor company Habana (recently acquired by Intel), Russian tech giant Yandex, Vietnam's newly established research lab VinAI Research, French deep learning inference acceleration startup Mipsology, and China-based remote sensing data platform TerraQuanta.

Colin Raffel, Senior Research Scientist, Google Brain: In 2019 the community made huge progress on learning from limited labels. MixMatch, UDA, S4L, and ReMixMatch produced huge gains on standard semi-supervised learning benchmarks.

application, learning, privacy, (11 more...)

#artificialintelligence

Country:

Asia > Vietnam (0.25)
Asia > Middle East > Israel (0.25)
Asia > China (0.25)

Industry: Information Technology > Security & Privacy (0.70)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.36)


Colin Raffel - Doing Strange Things with Attention - AI With The Best October 14-15, 2017

#artificialintelligence, Jan-12-2018, 04:37:50 GMT

AI With The Best hosted 50 speakers and hundreds of attendees from all over the world on a single platform on October 14-15, 2017. The platform held live talks, Insights/Questions pages, and bookings for 1-on-1s with speakers. Colin is a Research Scientist (formerly a resident) at Google Brain, where he is working on unsupervised learning, machine learning security, and models for sequential data. He did his PhD at Columbia University in LabROSA, supervised by Dan Ellis. He also has a Master's from Stanford University's CCRMA and a Bachelor's from Oberlin College.

artificial intelligence, machine learning, strange thing, (4 more...)

#artificialintelligence

Technology:

Information Technology > Communications > Social Media (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)
